Solution and Forecast Horizons for Infinite-Horizon Nonhomogeneous Markov Decision Processes

Authors

  • Torpong Cheevaprawatdomrong
  • Irwin E. Schochetman
  • Robert L. Smith
  • Alfredo Garcia
Abstract

In this paper, we address the challenge of solving a nonhomogeneous infinite horizon Markov Decision Process (MDP) problem. More precisely, we seek an algorithm that, when given a finite subset of the problem’s potentially infinite data set, delivers an optimal first period policy. Applied recursively within a rolling horizon procedure, such an algorithm generates an infinite horizon optimal solution to the original infinite horizon problem. However, it can happen that no such algorithm exists for a given problem. In this case, it is impossible to solve the problem with a finite state machine. We say such problems fail to be well-posed. Under the assumption of increasing marginal returns in actions with respect to states and stochastically increasing states transitioned into with respect to actions, we provide an algorithm that is guaranteed to solve the corresponding nonhomogeneous MDP whenever that problem is well-posed. The algorithm proceeds by discovering in finite time a forecast horizon for which an optimal solution delivers an optimal first period policy to the infinite horizon problem. In particular, we show by construction the existence of a forecast horizon (and hence a solution horizon) for all such well-posed problems. We illustrate the theory and algorithms developed by solving the time-varying version of the classic asset selling problem. 1991 Mathematics Subject Classification. Primary 90C40; Secondary 90B15, 90C39.
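The rolling horizon idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only solves finite truncations of a time-varying MDP by backward induction and reports the first period policy for each truncation length. The paper's contribution, a stopping rule that certifies a forecast horizon, is not reproduced here. The function `first_period_policy`, the toy asset selling data (three offer levels, offers that grow by 1 each period, a holding cost of 1, a discount factor of 0.9), and all other names are assumptions made for illustration.

```python
# Hypothetical sketch: first period policies of finite-horizon truncations
# of a nonhomogeneous (time-varying) MDP. All data below are invented.

def first_period_policy(reward, transition, n_states, n_actions, horizon, discount):
    """Backward induction on the problem truncated to `horizon` >= 1 periods.

    reward(t, s, a) returns the period-t reward; transition(t, s, a) returns
    a list of next-state probabilities. The value after the last period is
    taken to be zero. Returns the optimal period-0 action for each state.
    """
    value = [0.0] * n_states
    policy = [0] * n_states
    for t in range(horizon - 1, -1, -1):
        new_value, policy = [], []
        for s in range(n_states):
            best_a, best_q = 0, float("-inf")
            for a in range(n_actions):
                probs = transition(t, s, a)
                q = reward(t, s, a) + discount * sum(
                    p * v for p, v in zip(probs, value)
                )
                if q > best_q:  # ties keep the lowest-numbered action
                    best_a, best_q = a, q
            new_value.append(best_q)
            policy.append(best_a)
        value = new_value
    return policy  # after the loop this is the period-0 decision rule


# Toy time-varying asset selling problem (assumed data, not from the paper).
# States 0..2 are the current offer level; state 3 means the asset is sold.
REJECT, ACCEPT, SOLD = 0, 1, 3

def reward(t, s, a):
    if s == SOLD:
        return 0.0
    if a == ACCEPT:
        return 10.0 * (s + 1) + t  # offers drift upward over time
    return -1.0  # holding cost for waiting one more period

def transition(t, s, a):
    if s == SOLD or a == ACCEPT:
        return [0.0, 0.0, 0.0, 1.0]  # move to (or stay in) the sold state
    return [1 / 3, 1 / 3, 1 / 3, 0.0]  # next offer is uniform over levels

def policies_by_horizon(max_horizon):
    """First period policy for each truncation length 1..max_horizon."""
    return [
        first_period_policy(reward, transition, 4, 2, h, 0.9)
        for h in range(1, max_horizon + 1)
    ]

if __name__ == "__main__":
    for h, pol in enumerate(policies_by_horizon(6), start=1):
        print(h, pol)
```

In this toy instance the first period policy changes as the truncation grows and then settles. Observing that it has settled is not a proof that a forecast horizon has been reached; supplying such a guarantee is exactly what the stopping rule described in the abstract is for.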

Similar resources

Denumerable State Nonhomogeneous Markov Decision Processes

We consider denumerable state nonhomogeneous Markov decision processes and extend results from both denumerable state homogeneous and finite state nonhomogeneous problems. We show that, under weak ergodicity, accumulation points of finite horizon optima (termed algorithmic optima) are average cost optimal. We also establish the existence of solution horizons. Finally, an algorithm is presented ...

Forecast Horizons for a Class of Dynamic Games

In theory, a Markov perfect equilibrium of an infinite horizon, non-stationary dynamic game requires from players the ability to forecast an infinite amount of data. In this paper, we prove that early strategic decisions are effectively decoupled from the tail game, in non-stationary dynamic games with discounting and uniformly bounded rewards. This decoupling is formalized by the notion of a “...

A stochastic programming approach for planning horizons of infinite horizon capacity planning problems

Planning horizon is a key issue in production planning. Different from previous approaches based on Markov Decision Processes, we study the planning horizon of capacity planning problems within the framework of stochastic programming. We first consider an infinite horizon stochastic capacity planning model involving a single resource, linear cost structure, and discrete distributions for genera...

Conditions for the discovery of solution horizons

We present necessary and sufficient conditions for discrete infinite horizon optimization problems with unique solutions to be solvable. These problems can be equivalently viewed as the task of finding a shortest path in an infinite directed network. We provide general forward algorithms with stopping rules for their solution. The key condition required is that of weak reachability, which rough...

Average Optimality in Nonhomogeneous Infinite Horizon Markov Decision Processes

We consider a nonhomogeneous stochastic infinite horizon optimization problem whose objective is to minimize the overall average cost per-period of an infinite sequence of actions (average optimality). Optimal solutions to such problems will in general be non-stationary. Moreover, a solution which initially makes poor decisions, and then selects wisely thereafter, can be average optimal. Howeve...

Journal:
  • Math. Oper. Res.

Volume: 32, Issue:

Pages: -

Publication date: 2007